Informative Gene Selection Method Based on Symmetric Uncertainty and SVM Recursive Feature Elimination
YE Mingquan1, GAO Lingyun1, WU Changrong2, WAN Chunyuan1
1. Department of Computer Science, Wannan Medical College, Wuhu 241002; 2. School of Mathematics and Computer Science, Anhui Normal University, Wuhu 241003
Abstract: Gene expression profiles contain a large number of genes unrelated to tumor classification, which substantially reduces tumor prediction accuracy. The small sample size, high dimensionality, and noise of these data make tumor diagnosis harder still. To obtain an informative gene subset with fewer genes and better classification accuracy, an informative gene selection method based on symmetric uncertainty (SU) and support vector machine recursive feature elimination (SVM-RFE) is proposed. First, SU is used to evaluate the correlation between genes and class labels, and an approximate Markov blanket is defined on the basis of SU; together these eliminate irrelevant and redundant genes. Second, the SVM-RFE algorithm is applied to remove further redundant genes and obtain the final informative gene subset. Experimental results show that the proposed algorithm achieves higher classification performance with an informative gene subset of equal or smaller size.
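The first, SU-based stage can be illustrated with a minimal sketch. It assumes the standard definition SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)) and the FCBF-style approximate Markov blanket criterion (a gene g is redundant if a retained gene k satisfies SU(k, C) ≥ SU(g, C) and SU(k, g) ≥ SU(g, C)); the abstract does not state the exact definitions used, so these are assumptions. The function names, the `threshold` parameter, and the requirement that expression values be discretized beforehand are likewise hypothetical. The SVM-RFE stage is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def su_filter(genes, labels, threshold=0.0):
    """Drop irrelevant genes, then genes covered by an approximate Markov blanket.

    genes: dict mapping gene name -> list of discretized expression values.
    threshold: hypothetical relevance cut-off on SU(gene, class).
    """
    relevance = {g: symmetric_uncertainty(v, labels) for g, v in genes.items()}
    # Rank the relevant genes by decreasing SU with the class labels.
    ranked = sorted(
        (g for g in genes if relevance[g] > threshold),
        key=lambda g: relevance[g],
        reverse=True,
    )
    selected = []
    for g in ranked:
        # Every kept gene k already has SU(k, C) >= SU(g, C) by the ranking,
        # so k is treated as an approximate Markov blanket of g
        # when SU(k, g) >= SU(g, C).
        redundant = any(
            symmetric_uncertainty(genes[k], genes[g]) >= relevance[g]
            for k in selected
        )
        if not redundant:
            selected.append(g)
    return selected


# Toy data (illustrative only, not from the paper).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "gA": [0, 0, 0, 1, 1, 1, 1, 1],      # partly informative
    "gB": [0, 0, 0, 0, 0, 1, 1, 1],      # partly informative, complements gA
    "gA_inv": [1, 1, 1, 0, 0, 0, 0, 0],  # inverted copy of gA (redundant)
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the labels
}
kept = su_filter(genes, labels)
```

On this toy input the filter keeps `gA` and `gB`, discards `noise` as irrelevant, and discards `gA_inv` as redundant given `gA`. In the proposed method the surviving genes would then be passed to SVM-RFE for further reduction.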